This report is Part 2 in a five-part series in which we explore and analyze ocean buoy data collected from NOAA-maintained National Data Buoy Center (NDBC) stations. In Part 1 we explored ocean current observations at NDBC Station 46087 (Neah Bay Buoy) and compared them with ocean current forecasts from a third party. Here in Part 2 we will look at meteorological (wind and wave) data from the Neah Bay Buoy and examine the potential for significant meteorological events to introduce noise in ocean current observations. In Part 3 we will introduce meteorological data for another location, NDBC Station 46088 (New Dungeness Buoy), and compare trends in wave height, period, and direction with those of the Neah Bay Buoy. We will attempt to highlight the relationship between swell events at the Neah Bay Buoy and swell events at the New Dungeness Buoy. In Part 4 we will walk through the considerations and processes involved in training and testing a supervised ML model to predict the class of wave that might occur at the New Dungeness Buoy given conditions at the Neah Bay Buoy. In Part 5 we will put our final classifier model into production by supplying forecasted conditions for the Neah Bay Station and determining the predicted class of wave observed at the New Dungeness Station.
More detailed information regarding the NDBC, and the locations of buoys they maintain, can be found on their website.
In this report, we introduce meteorological data collected from NDBC Station 46087 (Neah Bay Buoy) at the west entrance to the Strait of Juan de Fuca. We explore summary statistics for mean aggregate values, noting seasonal trends in observations.
Then we introduce time series data visualization and gain further insight into seasonal trends for wind and wave data over years 2014 to 2019. We take a closer look at monthly observations for year 2016, where individual spikes in wave height, or ‘swell events’, become apparent.
Finally, we wrap up the data exploration by comparing instances of ‘erratic’ and ‘dampened’ ocean current observations, similar to those identified in Part 1 of this project, with meteorological data along the same timeline. We explore the elevated water temperatures during 2016, and their potential relation to increased marine growth. We conclude that the ocean current predictions are a fair and valid representation of how we could expect ocean current observations to behave in the absence of strong meteorological events (‘wind events’, ‘wave events’, etc.).
The data we will be exploring is available for download in nicely formatted, yearly ‘.txt’ files from the NDBC website here. There are a significant number of missing observations, with the range of available data spanning from 2004 through 2019. After compiling each year into a single dataset and performing routine data cleaning steps, I chose to engineer several new features. A formal definition and description of each feature is available in the appendix of this report, and details regarding measurement techniques utilized by the NDBC can be found here.
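As a rough illustration of the compile-and-clean step described above, the sketch below parses NDBC-style whitespace-delimited text with pandas, replaces the sentinel codes that mark missing measurements with NaN, and builds a timestamp index. The sample rows are made up, the column layout is assumed to follow the more recent standard meteorological format (older yearly files use a slightly different header), and the function name `load_ndbc_text` is hypothetical. It is not the code used to produce this report.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout: a header line, a units line,
# then one row per observation.
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD GST  WVHT   DPD   APD MWD   PRES  ATMP  WTMP  DEWP  VIS  TIDE
#yr  mo dy hr mn degT m/s  m/s     m   sec   sec degT   hPa  degC  degC  degC   mi    ft
2016 01 01 00 50 110  6.2  7.5  2.10 12.90  7.80 255 1021.3   6.1   8.9 999.0 99.0 99.00
2016 01 01 01 50 999 99.0 99.0 99.00 99.00 99.00 999 9999.0 999.0 999.0 999.0 99.0 99.00
2016 01 01 02 50 120  5.8  7.0  2.30 13.80  8.10 260 1020.9   6.3   8.9 999.0 99.0 99.00
"""

# Sentinel codes that stand for "no measurement", per column.
MISSING = {
    "WDIR": 999, "WSPD": 99.0, "GST": 99.0, "WVHT": 99.0, "DPD": 99.0,
    "APD": 99.0, "MWD": 999, "PRES": 9999.0, "ATMP": 999.0, "WTMP": 999.0,
    "DEWP": 999.0, "VIS": 99.0, "TIDE": 99.0,
}


def load_ndbc_text(source):
    """Read one NDBC-style text file (path or file-like) into a DataFrame."""
    # Row 0 holds the column names; row 1 holds units and is skipped.
    df = pd.read_csv(source, sep=r"\s+", header=0, skiprows=[1])
    df = df.rename(columns={"#YY": "year", "MM": "month", "DD": "day",
                            "hh": "hour", "mm": "minute"})
    time_cols = ["year", "month", "day", "hour", "minute"]
    df.index = pd.to_datetime(df[time_cols])
    df.index.name = "time"
    df = df.drop(columns=time_cols)
    # Turn sentinel codes into NaN so they cannot distort later averages.
    for col, code in MISSING.items():
        if col in df.columns:
            df[col] = df[col].where(df[col] != code)
    # Drop rows in which every measurement is missing.
    return df.dropna(how="all")


year_2016 = load_ndbc_text(io.StringIO(SAMPLE))
print(year_2016[["WVHT", "DPD", "MWD", "WSPD"]])
```

Several yearly files could then be combined with `pd.concat([load_ndbc_text(path) for path in paths])`, where `paths` is a list of local file names.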
Before we dive deep into the granularity of the data, I believe it is important to obtain a better understanding of what data is present. We can do this by aggregating data for all months to obtain monthly averages:
| Month | Number of Observations | Mean Wave Dir | Mean Wave Height | Mean APD | Mean DPD | Mean Wind Dir | Mean WSPD | Mean PRES | Mean ATMP | Mean WTMP |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 17879 | 248.71 | 2.41 | 7.74 | 12.08 | 143.46 | 7.70 | 1017.37 | 6.83 | 8.34 |
| 2 | 16743 | 254.41 | 2.16 | 7.66 | 11.87 | 145.15 | 6.63 | 987.68 | 6.63 | 8.17 |
| 3 | 17354 | 258.87 | 2.02 | 7.54 | 11.41 | 164.79 | 5.92 | 1015.75 | 7.47 | 8.75 |
| 4 | 16846 | 265.48 | 1.97 | 7.55 | 11.25 | 187.24 | 5.23 | 1017.27 | 8.72 | 9.66 |
| 5 | 17933 | 267.55 | 1.53 | 6.89 | 9.81 | 208.42 | 3.85 | 1017.15 | 10.54 | 10.68 |
| 6 | 16917 | 269.30 | 1.39 | 6.65 | 9.12 | 215.32 | 3.23 | 1017.47 | 11.73 | 11.39 |
| 7 | 19932 | 271.54 | 1.27 | 6.46 | 8.82 | 218.95 | 2.91 | 1017.96 | 12.53 | 11.86 |
| 8 | 20044 | 275.57 | 1.25 | 6.49 | 8.53 | 210.02 | 2.79 | 1016.60 | 12.68 | 11.95 |
| 9 | 18949 | 271.16 | 1.50 | 7.11 | 9.72 | 172.20 | 3.43 | 1016.60 | 12.30 | 11.73 |
| 10 | 21140 | 262.29 | 1.94 | 7.63 | 11.00 | 146.57 | 5.33 | 1016.13 | 10.91 | 11.31 |
| 11 | 19899 | 256.23 | 2.31 | 7.47 | 11.15 | 151.80 | 7.05 | 1015.19 | 8.99 | 10.61 |
| 12 | 19187 | 258.39 | 2.51 | 7.72 | 12.02 | 144.63 | 7.25 | 1016.26 | 6.94 | 9.09 |
We can see that the number of observations is fairly consistent from month to month, ranging from 16,743 in February to 21,140 in October. There is a lot of information here, and we will use data visualization techniques to explore it further. For now, consider this aggregation of data by year:
| Year | Number of Observations | Mean Wave Dir | Mean Wave Height | Mean APD | Mean DPD | Mean Wind Dir | Mean WSPD | Mean PRES | Mean ATMP | Mean WTMP |
|---|---|---|---|---|---|---|---|---|---|---|
| 2004 | 4218 | 266.35 | 1.89 | 7.22 | 10.42 | 171.05 | 4.44 | 1016.30 | 11.13 | 11.37 |
| 2005 | 16072 | 270.09 | 1.89 | 7.19 | 10.75 | 173.41 | 4.71 | 1015.66 | 10.07 | 10.72 |
| 2006 | 13157 | 267.43 | 1.93 | 7.04 | 10.24 | 190.36 | 4.75 | 980.79 | 10.20 | 10.72 |
| 2007 | 17072 | 262.73 | 2.03 | 7.38 | 10.66 | 177.64 | 5.21 | 1016.88 | 9.29 | 9.92 |
| 2008 | 17029 | 261.82 | 2.14 | 7.60 | 11.17 | 184.19 | 5.04 | 1016.93 | 8.51 | 9.31 |
| 2009 | 15434 | 268.64 | 1.74 | 7.09 | 10.26 | 191.26 | 4.95 | 1016.83 | 9.36 | 9.84 |
| 2011 | 12363 | 270.41 | 1.74 | 7.16 | 10.35 | 188.45 | 4.23 | 1017.68 | 10.65 | 10.29 |
| 2012 | 12853 | 264.33 | 1.89 | 7.20 | 10.38 | 177.13 | 5.29 | 1015.15 | 9.04 | 9.68 |
| 2013 | 10249 | 263.40 | 1.89 | 7.41 | 11.29 | 164.05 | 5.67 | 1020.69 | 7.57 | 8.68 |
| 2014 | 17405 | 261.32 | 1.80 | 7.09 | 10.29 | 176.30 | 5.59 | 1016.42 | 10.07 | 10.70 |
| 2015 | 17463 | 264.50 | 1.85 | 7.28 | 10.44 | 172.09 | 5.00 | 1016.68 | 10.68 | 11.20 |
| 2016 | 17451 | 260.11 | 2.02 | 7.46 | 10.74 | 169.52 | 5.36 | 1015.63 | 10.64 | 11.23 |
| 2017 | 17332 | 258.14 | 1.76 | 7.04 | 10.10 | 168.77 | 5.43 | 1015.88 | 9.60 | 10.41 |
| 2018 | 17420 | 264.24 | 1.79 | 7.18 | 10.32 | 175.46 | 5.16 | 1017.07 | 9.97 | 10.58 |
| 2019 | 17305 | 261.97 | 1.69 | 7.25 | 10.99 | 159.91 | 4.93 | 1016.22 | 10.04 | 10.64 |
Year 2010 is missing altogether. Also, I notice that 2004 has only about four thousand observations and 2013 about ten thousand, compared with more than seventeen thousand in the best-covered years. As we continue with our exploratory data analysis, it will be important to keep in mind that aggregations built from less data give a less reliable picture of what actually happened during that period. In other words, be wary of drawing conclusions from comparisons between aggregations with very different numbers, or distributions, of observations.
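To make the coverage concern concrete, here is a minimal sketch of how aggregate means can be computed alongside their observation counts, so that sparsely covered periods are easy to spot. The small data frame is made up, the helper name `aggregate_by` is hypothetical, and the one-half cutoff for flagging a period as sparse is an arbitrary choice for illustration.

```python
import pandas as pd

# Tiny made-up frame standing in for the compiled, cleaned dataset.
obs = pd.DataFrame(
    {"WVHT": [2.4, 2.6, 1.2, 1.3, 1.1, 2.0],
     "DPD": [12.1, 12.9, 8.3, 8.8, 9.1, 11.0]},
    index=pd.to_datetime(["2004-01-05", "2005-01-06", "2005-07-01",
                          "2005-07-02", "2005-07-03", "2005-10-10"]),
)


def aggregate_by(frame, period):
    """Column means plus an observation count, grouped by 'year' or 'month'."""
    key = getattr(frame.index, period)
    means = frame.groupby(key).mean()
    means.insert(0, "n_obs", frame.groupby(key).size())
    means.index.name = period
    return means


yearly = aggregate_by(obs, "year")
# Flag years with fewer than half the observations of the best-covered year.
yearly["sparse"] = yearly["n_obs"] < 0.5 * yearly["n_obs"].max()
print(yearly)

monthly = aggregate_by(obs, "month")
print(monthly)
```

Keeping `n_obs` next to the means makes it harder to compare a thinly sampled year against a fully sampled one without noticing.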
In light of this, let’s continue with a look at Monthly Aggregate data for Wave Heights:
The months along the y-axis have been arranged in ascending order of average wave height, and values for average wave height appear along the x-axis. The size of the point corresponds to the average dominant period, while the color corresponds to the average direction for each month. Notice the distinct grouping of winter months between October and April, which tend to have larger average wave heights, larger dominant periods, and a more Southerly direction. Also notice the distinct group of summer months between May and September, which tend to have smaller average wave heights, smaller dominant periods, and a more Northerly direction. These seasonal swell patterns are common knowledge, and their presence adds validity to our data.
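A chart with the encoding described above can be sketched with matplotlib as follows. The values are copied from the monthly table; the color map, the marker scaling, and the output file name are arbitrary assumptions and are not taken from the original figure.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive use
import matplotlib.pyplot as plt

# (month, mean wave height [m], mean dominant period [s], mean wave direction [deg]),
# copied from the monthly table above.
MONTHLY = [
    (1, 2.41, 12.08, 248.71), (2, 2.16, 11.87, 254.41), (3, 2.02, 11.41, 258.87),
    (4, 1.97, 11.25, 265.48), (5, 1.53, 9.81, 267.55), (6, 1.39, 9.12, 269.30),
    (7, 1.27, 8.82, 271.54), (8, 1.25, 8.53, 275.57), (9, 1.50, 9.72, 271.16),
    (10, 1.94, 11.00, 262.29), (11, 2.31, 11.15, 256.23), (12, 2.51, 12.02, 258.39),
]

# Months in ascending order of mean wave height, bottom to top.
ordered = sorted(MONTHLY, key=lambda row: row[1])
labels = [calendar.month_abbr[month] for month, _, _, _ in ordered]
heights = [height for _, height, _, _ in ordered]
# Marker area grows with dominant period; the offset and factor are arbitrary.
sizes = [40 * (period - 8.0) for _, _, period, _ in ordered]
directions = [direction for _, _, _, direction in ordered]

fig, ax = plt.subplots(figsize=(7, 5))
points = ax.scatter(heights, range(len(ordered)), s=sizes, c=directions, cmap="viridis")
ax.set_yticks(range(len(ordered)))
ax.set_yticklabels(labels)
ax.set_xlabel("Mean wave height (m)")
ax.set_ylabel("Month")
fig.colorbar(points, ax=ax, label="Mean wave direction (degrees)")
fig.savefig("monthly_wave_height.png")
```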
Consider the following display of the Yearly Aggregated Wave Height data:
The years along the y-axis have been arranged sequentially from 2004 to 2019, and values for average wave height appear along the x-axis. In this plot, the size of the point corresponds to the number of observations used to generate the aggregate means, while the color represents the average wave direction.
Now let’s take a look at Monthly Aggregated Wind and Weather Data:
Here we see months arranged in ascending order of average wind speed along the y-axis and values for average wind speed along the x-axis. The size of the point corresponds to the atmospheric pressure, while the color of the point corresponds to the average wind direction for each month. Remember, it’s valid to draw conclusions from comparisons in this chart since each month has a roughly equal number of observations. Again we notice two distinct groups of winter and summer months. Winter months tend to have lower atmospheric pressure and stronger winds averaging from a more South Easterly direction, while summer months tend to have higher atmospheric pressure and lighter winds averaging from a more South Westerly direction.
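One caveat when averaging directions: compass bearings wrap around at 0/360 degrees, so a plain arithmetic mean can point the wrong way when observations straddle north. A minimal sketch of a circular mean is shown below, assuming bearings in degrees; how the directional means in this report were computed is not stated here, and the helper name `circular_mean_deg` is hypothetical.

```python
import math


def circular_mean_deg(bearings):
    """Mean of compass bearings in degrees, respecting the 0/360 wrap."""
    # Average the unit vectors for each bearing, then convert back to an angle.
    sin_sum = sum(math.sin(math.radians(b)) for b in bearings)
    cos_sum = sum(math.cos(math.radians(b)) for b in bearings)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


northerly = [350.0, 10.0]
print(sum(northerly) / len(northerly))  # arithmetic mean: 180.0, i.e. due south
print(circular_mean_deg(northerly))     # lands at north (0/360), up to rounding
```

For the monthly wind and wave directions tabulated earlier, which sit far from the wrap point, the two approaches should give similar answers.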
Finally let’s look at Monthly Aggregated Air and Water Temperatures:
Here we see months arranged in ascending order of average water temperature along the y-axis, with values for average water temperature appearing on the x-axis. There appear to be three major groupings of months, with December through April on the lower end, November and May in the middle, and June through October showing the warmest average water temperatures. Also, there appears to be a strong positive correlation between mean water temperature and mean air temperature.
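The strength of that relationship can be checked directly from the twelve monthly means in the table above. The sketch below computes a Pearson correlation coefficient in plain Python; note that it describes the monthly averages only, not the underlying individual observations.

```python
import math

# Monthly means (January to December) copied from the monthly table above.
atmp = [6.83, 6.63, 7.47, 8.72, 10.54, 11.73, 12.53, 12.68, 12.30, 10.91, 8.99, 6.94]
wtmp = [8.34, 8.17, 8.75, 9.66, 10.68, 11.39, 11.86, 11.95, 11.73, 11.31, 10.61, 9.09]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson(atmp, wtmp)
print(f"Pearson r between monthly mean air and water temperature: {r:.2f}")
```

A value close to 1 supports the visual impression of a strong positive correlation, although water temperature lags air temperature somewhat in autumn (compare the November values).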
Let’s take a different approach and explore this data in a series of yearly plots. Consider these visualizations of wave data for 2016 through 2019:
The height of a recorded wave is indicated on the y-axis, with the color of the point representing the direction the wave is coming from. There is some data missing for part of the spring and early summer of 2017. In general we see two primary colors: shades of green indicating a more Southwesterly direction and shades of blue indicating a more Northwesterly direction. Consistent with our previous data aggregations, we see larger swell typically occurring during the winter months.
Let’s look at the Wind and Pressure data for the same date range:
Here the y-axis shows wind speeds in m/s. The color of the point corresponds to the direction the wind is coming from, and the size corresponds to the pressure at the time of the observation. The wind directions aren’t as cleanly organized as the wave direction data, but we do see trends in winter versus summer months.
Now let’s focus our attention on monthly observations of wave data for year 2016:
Each peak on these plots corresponds with a ‘swell event’. It is interesting to note how cleanly organized some of the swell events are, in comparison to the disorganized appearance of others. I wonder how they relate to concurrent wind data.
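One simple way to turn the peaks described above into something countable is to look for local maxima in wave height that exceed a chosen threshold. The sketch below does this in plain Python; the function name `find_swell_events`, the sample heights, and the 3.0 m threshold are all hypothetical, since this report does not give a numerical definition of a swell event.

```python
def find_swell_events(heights, threshold):
    """Return indices of local maxima in `heights` that exceed `threshold`."""
    events = []
    for i in range(1, len(heights) - 1):
        rising = heights[i - 1] < heights[i]
        not_rising_after = heights[i] >= heights[i + 1]
        if heights[i] > threshold and rising and not_rising_after:
            events.append(i)
    return events


# Made-up sequence of wave heights in meters.
wvht = [1.2, 1.4, 2.9, 4.1, 3.6, 2.2, 1.5, 1.8, 3.3, 3.0, 1.6]
print(find_swell_events(wvht, threshold=3.0))  # [3, 8]
```

Real buoy records are noisier than this example, so some smoothing or a minimum spacing between peaks would likely be needed before counting events this way.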
Let’s find out: